Suffix Stripping Problem as an Optimization Problem
Authors
Abstract
Stemming, or suffix stripping, is an important part of modern Information Retrieval systems: the task is to find the root word (stem) of a given cluster of words. Existing algorithms for this problem have been developed in an ad hoc manner. In this work, we model the problem as an optimization problem and develop an Integer Program (IP) to overcome the shortcomings of existing approaches. Sample results of the proposed method are compared with an established technique in the field for the English language. AMPL code for the IP is also given.
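The abstract does not state the IP's variables, constraints, or objective, so the sketch below is not the authors' formulation. It only illustrates, under our own assumptions, what "stemming as an optimization problem" can look like.

The assumptions are as follows:

* The suffix inventory `ALLOWED_SUFFIXES` is hypothetical.
* Exactly one stem is chosen per cluster.
* Every word must equal the stem plus an allowed suffix.
* The objective is to minimize the total number of stripped characters.

The model is solved by brute-force enumeration rather than an IP solver or AMPL.

```python
from typing import Iterable, Optional

# Hypothetical suffix inventory, for illustration only (not from the paper).
ALLOWED_SUFFIXES = frozenset({"", "s", "es", "ed", "ing", "ion", "ions", "er", "ers"})


def choose_stem(cluster: Iterable[str],
                suffixes: frozenset = ALLOWED_SUFFIXES) -> Optional[str]:
    """Choose one stem for a word cluster by exhaustive search.

    Decision:   select exactly one candidate stem (a 0-1 choice over prefixes).
    Constraint: every word must equal stem + an allowed suffix.
    Objective:  minimise the total number of stripped characters.
    Returns None when no candidate satisfies the constraint.
    """
    words = list(cluster)
    if not words:
        return None
    # A feasible stem is a prefix of every word, so the prefixes of the
    # shortest word are sufficient as candidates.
    shortest = min(words, key=len)
    best, best_cost = None, None
    for k in range(len(shortest) + 1):
        stem = shortest[:k]
        if not all(w.startswith(stem) and w[k:] in suffixes for w in words):
            continue  # candidate violates the suffix constraint
        cost = sum(len(w) - k for w in words)  # characters stripped overall
        if best_cost is None or cost < best_cost:
            best, best_cost = stem, cost
    return best
```

For example, `choose_stem(["connect", "connected", "connecting", "connection", "connections"])` selects `"connect"`. Shorter candidates such as `"connec"` would leave the disallowed suffix `"t"`. An actual IP would express the same choice with binary variables and linear constraints, and could encode richer objectives than this toy one.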
Similar resources
Enhanced Confix Stripping Stemmer and Ants Algorithm for Classifying News Document in Indonesian Language
The ants algorithm is a universal and flexible solution that was first designed for solving optimization problems such as the Traveling Salesman Problem. The analogy between ants finding the shortest path and finding the most similar documents became the stimulus for the ant-based text document clustering method. This method consists of two phases: finding the most similar documents (trial phase) and clusters mak...
A Continuous Optimization Model for Partial Digest Problem
The purpose of this paper is to model the Partial Digest Problem (PDP) as a mathematical programming problem. In this paper we present a new viewpoint on PDP. We formulate the PDP as a continuous optimization problem and develop a method to solve it. Finally, we construct a linear programming model for the problem with an additional constraint. This latter model can be solved by the simp...
Bi-objective optimization of multi-server intermodal hub-location-allocation problem in congested systems: modeling and solution
A new multi-objective intermodal hub-location-allocation problem is modeled in this paper, in which both the origin and the destination hub facilities are modeled as M/M/m queuing systems. The problem is formulated as a constrained bi-objective optimization model that minimizes both the total costs and the total system time. A small-size problem is solved on the GAMS software t...
Overlay Problems for Music and Combinatorics
Motivated by the identification of the musical structure of pop songs, we introduce combinatorial problems involving overlays (non-overlapping substrings) and the covering of a text t by them. We present 4 problems and suggest solutions based on string pattern matching techniques. We show that decision problems of this type can be solved using an Aho-Corasick keyword automaton. We conjecture th...
An Unsupervised Approach to Develop Stemmer
This paper presents an unsupervised approach to developing a stemmer (for the Urdu and Marathi languages). Especially during the last few years, a wide range of information in Indian regional languages has been made available on the web in the form of e-data. However, access to these data repositories is very limited because efficient search engines/retrieval systems supporting these lang...
Journal title:
- CoRR
Volume: abs/1312.6802
Publication date: 2013